ABSTRACT


Based on randomly right-censored data, a smooth nonparametric estimator of the quantile function of the lifetime distribution is studied. The estimator is defined to be the solution $x_n(p)$ to $\hat F_n(x_n(p)) = p$, where $\hat F_n$ is the distribution function corresponding to a kernel estimator of the lifetime density. The strong consistency and asymptotic normality of $x_n(p)$ are shown. Some simulation results comparing this estimator with the product-limit quantile estimator and a kernel-type estimator are presented. Data-based selection of the bandwidth required for computing $\hat F_n$ is investigated using bootstrap methods. Illustrative examples are given.


Key words: Right-censoring; Percentiles; Asymptotic normality; Bootstrap method.
1.  INTRODUCTION


Right-censored data arise frequently in industrial life testing experiments and medical studies. From such data it is important to be able to obtain good nonparametric estimates of various characteristics of the unknown underlying lifetime distribution. One characteristic of the lifetime distribution that is of interest is the quantile function. For example, in product development it is typical to estimate certain percentiles of the lifetime distribution of the product based on the right-censored observations from life tests. That is, estimates of possible guarantee times for the product are desired.

For a probability distribution function G, the quantile function is defined by $Q(p) = G^{-1}(p) = \inf\{t: G(t) \ge p\}$, $0 < p < 1$. For a random (uncensored) sample from G, several nonparametric estimates of Q(p) have been suggested. The sample quantile function, $G_n^{-1}(p) = \inf\{x: G_n(x) \ge p\}$, where $G_n(x)$ denotes the sample distribution function, has been studied (see Csörgő, 1983, for many of the known properties of $G_n^{-1}$). Another approach has been to solve $\hat G_n(x_p) = p$ for $x_p$, where $\hat G_n(x) = \int_{-\infty}^{x} \hat g_n(t)\,dt$, with $\hat g_n$ being a kernel estimator of the density function of G (see Nadaraya, 1964). Recently, Yang (1985) studied a kernel-type estimator which smoothed the sample quantile function $G_n^{-1}(p)$.

For right-censored data, Sander (1975) proposed estimation of the quantile function by the product-limit (PL) estimator, defined by $Q_n = F_n^{-1}$, where $F_n$ denotes the PL estimator of the lifetime distribution (Kaplan and Meier, 1958; Efron, 1967). Sander (1975) and Cheng (1984) obtained some asymptotic properties of $Q_n$, and Csörgő (1983) discussed strong approximation results. Padgett (1985) studied a kernel-type quantile estimator from right-censored observations, extending the complete-sample results of Yang (1985). Lio, Padgett, and Yu (1986) and Lio and Padgett (1987) presented some asymptotic properties of the kernel-type estimator, including asymptotic normality and mean square convergence. Also, for the kernel-type estimator, Padgett and Thombs (1986) presented the results of extensive simulations and investigated the use of bootstrap methods for bandwidth selection and confidence intervals.

In this paper, a smooth nonparametric estimator of the quantile function is studied which is defined as the solution $x_n(p)$ to $\hat F_n(x_n(p)) = p$, where $\hat F_n(x)$ is the distribution function corresponding to a kernel density estimator $\hat f_n(x)$ of the lifetime density from right-censored data. The kernel density estimator proposed by Foldes, Rejto, and Winter (1980) and McNichols and Padgett (1986) is used here. The estimator $x_n(p)$ is intuitively more appealing than the kernel-type estimator of Padgett (1986) since it is a nondecreasing function of p. The kernel-type estimate can decrease for large p due to the scarcity of large uncensored observations in the sample.

The estimator $x_n(p)$ is defined in Section 2, and strong consistency and asymptotic normality are presented in Section 3. In Section 4, some simulation results are discussed comparing this estimator with the PL quantile estimator and the kernel-type estimator of Padgett (1986) with respect to estimated mean squared errors. Bootstrap methods for choosing a data-based bandwidth value for the kernel density estimate $\hat f_n(x)$ are presented in Section 5. A confidence interval for the true lifetime distribution quantile based on the bootstrap percentile interval method is also given in that section.

2.  NOTATION  AND  PRELIMINARIES 

Let $X_1^0, \ldots, X_n^0$ denote the true survival times of n items or individuals that are censored on the right by a sequence $U_1, \ldots, U_n$, which in general may be either constants or random variables. The $X_i^0$'s are nonnegative, independent, identically distributed random variables with common unknown continuous distribution function $F_0$, unknown quantile function $Q^0(p) = F_0^{-1}(p) = \inf\{t: F_0(t) \ge p\}$, $0 < p < 1$, and unknown density function $f_0$.

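The observed right-censored data are denoted by the pairs $(X_i, \Delta_i)$, $i = 1, \ldots, n$, where

$$X_i = \min\{X_i^0, U_i\}, \qquad \Delta_i = \begin{cases} 1 & \text{if } X_i^0 \le U_i, \\ 0 & \text{if } X_i^0 > U_i. \end{cases}$$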

Thus, it is known which observations are times of failure or death and which ones are censored or loss times. The nature of the censoring depends on the $U_i$'s. (i) If $U_1, \ldots, U_n$ are fixed constants, the observations are time-truncated. If all $U_i$'s are equal to the same constant, then the case of Type I censoring results. (ii) If all $U_i = X_{(r)}^0$, the rth order statistic of $X_1^0, \ldots, X_n^0$, then the situation is that of Type II censoring. (iii) If $U_1, \ldots, U_n$ constitute a random sample from a distribution H (usually unknown) and are independent of $X_1^0, \ldots, X_n^0$, then $(X_i, \Delta_i)$, $i = 1, 2, \ldots, n$, is called a randomly right-censored sample.

For the asymptotic results in Section 3 of this paper, the random censorship model (iii) is assumed. For this model the distribution function of each $X_i$ is $F = 1 - (1 - F_0)(1 - H)$. This assumption is typically necessary for asymptotic results under censoring; see, for example, Breslow and Crowley (1974), Foldes, Rejto and Winter (1980), Padgett (1985), and Lio, Padgett and Yu (1986).
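The random censorship model (iii) is straightforward to simulate. The following is a minimal Python sketch, assuming exponential lifetimes with rate 1 and exponential censoring times with rate θ = 3/7 (the setting of Table 1); the helper name `simulate_censored` is illustrative only. It checks empirically that $1 - F(t) = [1 - F_0(t)][1 - H(t)]$ and that the censored fraction is close to θ/(1 + θ) = 0.3.

```python
import math
import random


def simulate_censored(n, theta, rng):
    """Draw (X_i, Delta_i), i = 1..n, under the random censorship model (iii).

    Assumed here: lifetimes X_i^0 ~ exponential(rate 1) and censoring times
    U_i ~ exponential(rate theta), independent of the lifetimes."""
    sample = []
    for _ in range(n):
        x0 = rng.expovariate(1.0)          # true lifetime X_i^0
        u = rng.expovariate(theta)         # censoring time U_i
        sample.append((min(x0, u), 1 if x0 <= u else 0))
    return sample


rng = random.Random(1)
theta = 3 / 7
sample = simulate_censored(20000, theta, rng)

# Compare P(X > t) with [1 - F0(t)][1 - H(t)] = exp(-t) * exp(-theta * t).
t = 0.5
emp_surv = sum(x > t for x, _ in sample) / len(sample)
model_surv = math.exp(-t) * math.exp(-theta * t)
cens_frac = sum(d == 0 for _, d in sample) / len(sample)
print(f"P(X > {t}): empirical {emp_surv:.3f}, model {model_surv:.3f}")
print(f"censored fraction: {cens_frac:.3f}, theory {theta / (1 + theta):.3f}")
```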


A popular estimator of the survival function $1 - F_0(t)$ from the censored sample $(X_i, \Delta_i)$, $i = 1, \ldots, n$, is the product-limit (PL) estimator of Kaplan and Meier (1958). The PL estimator, which was shown to be "self-consistent" by Efron (1967), is defined as follows. Let $(Z_i, \Lambda_i)$, $i = 1, \ldots, n$, denote the ordered $X_i$'s along with their corresponding $\Delta_i$'s. Values of the censored sample will be denoted by the corresponding lower-case letters, $(x_i, \delta_i)$ and $(z_i, \lambda_i)$, for the unordered and ordered sample, respectively. Then the PL estimator of $1 - F_0(t)$ is

$$P_n(t) = \begin{cases} 1, & 0 \le t \le Z_1, \\ \prod_{i=1}^{k-1} \left( \dfrac{n-i}{n-i+1} \right)^{\Lambda_i}, & Z_{k-1} < t \le Z_k, \quad k = 2, \ldots, n, \\ 0, & Z_n < t. \end{cases}$$

The PL estimator of $F_0(t)$ is denoted by $F_n(t) = 1 - P_n(t)$, and the size of the jump of $P_n$ (or $F_n$) at $Z_j$ is denoted by $s_j$. Note that $s_j = 0$ if and only if $Z_j$ is censored for $j < n$, i.e., if and only if $\Lambda_j = 0$. Define $S_j = \sum_{i=1}^{j} s_i = F_n(Z_{j+1})$, $j = 1, \ldots, n-1$, and $S_n = 1$.
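The jumps $s_j$ and partial sums $S_j$ can be computed in one pass over the ordered sample. The Python sketch below (standard library only) follows the product formula for $P_n$ above and treats the largest observation as a failure so that $S_n = 1$; placing failures before censorings at tied times is an added assumption, since the formula above is written for distinct $Z_i$. It also evaluates the PL quantile function discussed next, $Q_n(p) = Z_i$ for $S_{i-1} < p \le S_i$. The function names are hypothetical.

```python
def pl_jumps(data):
    """Return ordered times Z_1..Z_n, PL jumps s_1..s_n, and partial sums S_1..S_n.

    The largest observation is treated as a failure, so P_n(t) = 0 for t > Z_n
    and S_n = 1.  At tied times failures are placed before censorings (an
    assumption; the product formula is written for distinct Z_i)."""
    data = sorted(data, key=lambda pair: (pair[0], -pair[1]))
    n = len(data)
    z, s, S = [], [], []
    surv, cum = 1.0, 0.0                  # surv = value of P_n just after the previous Z
    for i, (x, lam) in enumerate(data, start=1):
        if i == n:
            new = 0.0                     # P_n(t) = 0 for t > Z_n
        else:
            new = surv * ((n - i) / (n - i + 1)) ** lam
        z.append(x)
        s.append(surv - new)              # s_i = 0 when lam = 0 and i < n
        cum += surv - new
        S.append(cum)                     # S_i = s_1 + ... + s_i
        surv = new
    return z, s, S


def pl_quantile(z, S, p):
    """PL quantile Q_n(p) = inf{t : F_n(t) >= p}, i.e. Z_i for S_{i-1} < p <= S_i."""
    for zi, Si in zip(z, S):
        if Si >= p:
            return zi
    return z[-1]                          # guards against S_n < 1 from rounding


# Toy sample of (time, indicator) pairs; the observation at 2.0 is censored.
data = [(3.0, 1), (1.0, 1), (4.0, 1), (2.0, 0)]
z, s, S = pl_jumps(data)
print(z, s, S)                 # [1.0, 2.0, 3.0, 4.0] [0.25, 0.0, 0.375, 0.375] [0.25, 0.25, 0.625, 1.0]
print(pl_quantile(z, S, 0.5))  # 3.0
```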

A natural estimator of $Q^0(p)$ is the PL quantile function $Q_n(p) = \inf\{t: F_n(t) \ge p\}$ (see, for example, Sander (1975), Cheng (1984), and Csörgő (1983) for some of the properties of $Q_n$). Since $Q_n$ is a step function with jumps corresponding to the uncensored observations, it is desirable to obtain a smoothed estimator of $Q^0$. The kernel-smoothed estimator considered by Padgett (1986), Lio, Padgett and Yu (1986), Lio and Padgett (1987), and Padgett and Thombs (1986) is such an estimator, and is defined as follows. Let $\{h = h_n\}$ be a "bandwidth" sequence of positive numbers such that $h_n \to 0$ as $n \to \infty$, and let K be a bounded probability density function which is zero outside a finite interval $(-c, c)$ and is symmetric about zero. (For asymptotic results, other conditions on $h_n$, K, and $F_0$ are needed, but these are the only assumptions that will be made for purposes of definition.) Then for $0 < p < 1$, the kernel quantile function estimator is given by

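$$\hat Q_n(p) = h^{-1} \int_0^1 Q_n(t)\, K\!\left(\frac{t-p}{h}\right) dt = h^{-1} \sum_{i=1}^{n} Z_i \int_{S_{i-1}}^{S_i} K\!\left(\frac{t-p}{h}\right) dt, \qquad (2.1)$$

where $S_0 = 0$. An approximation to $\hat Q_n(p)$ was given by Padgett (1986) as

$$\tilde Q_n(p) = h^{-1} \sum_{i=1}^{n} Z_i\, s_i\, K\!\left(\frac{S_i - p}{h}\right). \qquad (2.2)$$

Although neither estimator is difficult to compute, (2.2) will be simpler for some kernel functions.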
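Because $h^{-1}\int_{S_{i-1}}^{S_i} K((t-p)/h)\,dt = W((S_i-p)/h) - W((S_{i-1}-p)/h)$, where W is the kernel distribution function introduced with (2.3) below, (2.1) can be evaluated exactly without numerical integration. A minimal Python sketch of (2.1) and (2.2), assuming the triangular kernel $K(u) = 1 - |u|$ on [-1, 1] used in Section 4; the inputs are the ordered times $Z_i$ and PL jumps $s_i$, and the function names are hypothetical.

```python
def K(u):
    """Triangular kernel K(u) = 1 - |u| on [-1, 1]."""
    return max(0.0, 1.0 - abs(u))


def W(u):
    """Distribution function of the triangular kernel, W(u) = integral of K up to u."""
    if u <= -1.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    if u <= 0.0:
        return 0.5 * (1.0 + u) ** 2
    return 1.0 - 0.5 * (1.0 - u) ** 2


def kernel_quantile(z, s, p, h):
    """(2.1): sum_i Z_i [W((S_i - p)/h) - W((S_{i-1} - p)/h)], with S_0 = 0."""
    total, S_prev = 0.0, 0.0
    for zi, si in zip(z, s):
        S_i = S_prev + si
        total += zi * (W((S_i - p) / h) - W((S_prev - p) / h))
        S_prev = S_i
    return total


def kernel_quantile_approx(z, s, p, h):
    """(2.2): h^{-1} sum_i Z_i s_i K((S_i - p)/h)."""
    total, S_i = 0.0, 0.0
    for zi, si in zip(z, s):
        S_i += si
        total += zi * si * K((S_i - p) / h)
    return total / h


# Toy ordered sample: Z = 1, 2, 3, 4 (second one censored), PL jumps .25, 0, .375, .375
z = [1.0, 2.0, 3.0, 4.0]
s = [0.25, 0.0, 0.375, 0.375]
q_21 = kernel_quantile(z, s, p=0.5, h=0.5)
q_22 = kernel_quantile_approx(z, s, p=0.5, h=0.5)
```

When h is small enough that the window $(p - h, p + h)$ lies inside a single interval $(S_{i-1}, S_i]$, all the weight in (2.1) falls on that $Z_i$ and $\hat Q_n(p)$ coincides with $Q_n(p)$.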

In this paper a smooth nonparametric quantile estimator different from (2.1) is studied. It is defined as the quantile function corresponding to the distribution function obtained from a kernel-smoothed density estimate. To define this estimator from the ordered censored data $(Z_i, \Lambda_i)$, $i = 1, \ldots, n$, consider the kernel density estimator of $f_0$ written by McNichols and Padgett (1986) as

$$\hat f_n(t) = h^{-1} \sum_{j=1}^{n} s_j\, K\!\left(\frac{t - Z_j}{h}\right), \qquad t \ge 0,$$

where h and K are the bandwidth and kernel function defined earlier. The distribution function corresponding to the density $\hat f_n$ can be written as

$$\hat F_n(x) = \int_{-\infty}^{x} \hat f_n(t)\, dt = \int_0^{\infty} W\!\left(\frac{x - t}{h}\right) dF_n(t) = \sum_{j=1}^{n} s_j\, W\!\left(\frac{x - Z_j}{h}\right), \qquad (2.3)$$

where $W(t) = \int_{-\infty}^{t} K(u)\, du$ is the distribution function for the kernel K. Then the estimator $x_n(p)$ of the pth quantile, $Q^0(p)$, is defined to be the solution to the equation $\hat F_n(x) = p$. This solution can be found iteratively by the Newton-Raphson method, for example, using a starting value such as the PL quantile $Q_n(p)$. In all computations reported in this paper, the iterations converged rapidly.
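A minimal Python sketch of this computation, assuming the triangular kernel of Section 4 (so c = 1) and that the ordered times $Z_j$ and PL jumps $s_j$, summing to 1, are already available. Safeguarding Newton-Raphson with bisection is a choice made here rather than something prescribed above: since $\hat F_n$ is continuous and nondecreasing, with $\hat F_n = 0$ below $Z_1 - ch$ and $\hat F_n = 1$ above $Z_n + ch$, a root is always bracketed. The names `F_hat`, `f_hat`, and `solve_quantile` are hypothetical.

```python
def K(u):
    """Triangular kernel K(u) = 1 - |u| on [-1, 1] (c = 1), as used in Section 4."""
    return max(0.0, 1.0 - abs(u))


def W(u):
    """W(u) = integral of K from -inf to u (distribution function of the kernel)."""
    if u <= -1.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    if u <= 0.0:
        return 0.5 * (1.0 + u) ** 2
    return 1.0 - 0.5 * (1.0 - u) ** 2


def f_hat(x, z, s, h):
    """Kernel density estimate h^{-1} sum_j s_j K((x - Z_j)/h)."""
    return sum(sj * K((x - zj) / h) for zj, sj in zip(z, s)) / h


def F_hat(x, z, s, h):
    """Smoothed distribution function (2.3): sum_j s_j W((x - Z_j)/h)."""
    return sum(sj * W((x - zj) / h) for zj, sj in zip(z, s))


def solve_quantile(z, s, p, h, x0, tol=1e-10, max_iter=200):
    """Solve F_hat(x) = p for x = x_n(p) by Newton-Raphson started at x0,
    falling back to bisection whenever a Newton step leaves the bracket."""
    lo, hi = min(z) - h, max(z) + h      # F_hat(lo) = 0 and F_hat(hi) = sum(s) = 1
    x = min(max(x0, lo), hi)
    for _ in range(max_iter):
        g = F_hat(x, z, s, h) - p
        if abs(g) < tol or hi - lo < tol:
            break
        if g > 0:
            hi = x                        # root lies to the left of x
        else:
            lo = x                        # root lies to the right of x
        d = f_hat(x, z, s, h)
        step_ok = d > 0 and lo < x - g / d < hi
        x = x - g / d if step_ok else 0.5 * (lo + hi)
    return x


# Toy ordered sample: Z = 1, 2, 3, 4 (second one censored), PL jumps .25, 0, .375, .375
z = [1.0, 2.0, 3.0, 4.0]
s = [0.25, 0.0, 0.375, 0.375]
x_half = solve_quantile(z, s, p=0.5, h=0.5, x0=3.0)   # start at Q_n(0.5) = 3
```

For this toy sample the root at p = 0.5 is $3 + 0.5(1 - \sqrt{2/3}) \approx 3.092$, and solving over increasing p gives nondecreasing values of $x_n(p)$, as expected from the monotonicity of $\hat F_n$.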

Although the estimate $x_n(p)$ must be obtained by an iterative procedure, whereas $\hat Q_n(p)$ can be calculated directly, $x_n(p)$ is more appealing since it is always a nondecreasing function of p. The estimate $\hat Q_n(p)$ can decrease for large p (see Figure 1 of Padgett, 1986). This is due to the scarcity of uncensored observations from the tail of $F_0$ and can possibly be avoided by appropriately increasing the bandwidth h to compensate for fewer observations. However, this would tend to oversmooth the estimate. A small computer simulation study, discussed in Section 4, gives some further comparison of $x_n(p)$ and $\hat Q_n(p)$ for small sample sizes. In fact, the simulation results indicate that, while both $x_n(p)$ and $\hat Q_n(p)$ are better than $Q_n(p)$ in the sense of smaller estimated mean squared errors for a range of bandwidth values, neither performs uniformly better than the other.


3.  SOME  ASYMPTOTIC  RESULTS 


In this section, the consistency and asymptotic normality of the estimator $x_n(p)$ are presented under the random right-censorship model. The results are stated here, and their proofs are outlined in the Appendix. Consistency is stated first.


For any distribution function G, define $T_G = \sup\{t: G(t) < 1\}$.


Theorem 1. Suppose $F_0$ is continuous and strictly increasing on $[0, \infty)$. If either (i) $H(T_{F_0}^-) < 1$, where $t^-$ denotes limit from the left, or (ii) $T_{F_0} \le T_H \le \infty$, then $x_n(p) \to Q^0(p)$ almost surely as $n \to \infty$.


The conditions $H(T_{F_0}^-) < 1$ and $T_{F_0} \le T_H \le \infty$ guarantee a positive probability of observing uncensored data points from the entire support of the lifetime distribution $F_0$. The condition in (ii) allows both the lifetime distribution and the censoring distribution to have the same support and to have support equal to the interval $[0, \infty)$. Hence, $F_0$ and H can both be exponential, Weibull, or gamma distributions, for example.


As the next theorem states, $x_n(p)$ has the same limiting normal distribution as the kernel-type estimator $\hat Q_n(p)$ (Lio, Padgett and Yu, 1986). The proof, given in the Appendix, uses the concept of Kiefer processes (see Csörgő, 1983).
Theorem 2. Let T satisfy $1 - F(T) = [1 - F_0(T)][1 - H(T)] > 0$. Assume that $Q^0(p) < T$, $f_0(1 - H)$ is continuous and positive at $Q^0(p)$, the density function of H is continuous, and K is a continuous density defined on the finite interval $[-c, c]$. If $h \to 0$, $nh \to \infty$, and $\sqrt{n}\,h \to 0$ as $n \to \infty$, then $\sqrt{n}\,[x_n(p) - Q^0(p)] \to Z$ in distribution, where Z is a normally distributed random variable with mean zero and variance

$$\sigma_p^2 = (1 - p)^2\, [f_0(Q^0(p))]^{-2} \int_0^{Q^0(p)} [1 - F(u)]^{-2}\, dF^*(u),$$

with $F^*(u) = P(X \le u, \Delta = 1)$ denoting the subdistribution function of the uncensored observations.
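As a worked check of the variance formula, take $F_0$ exponential with mean 1 and H exponential with rate θ (an illustrative assumption). Then $1 - F(u) = e^{-(1+\theta)u}$, $dF^*(u) = f_0(u)[1 - H(u)]\,du = e^{-(1+\theta)u}\,du$, $Q^0(p) = -\log(1-p)$ and $f_0(Q^0(p)) = 1 - p$, so $\sigma_p^2 = [(1-p)^{-(1+\theta)} - 1]/(1+\theta)$, which reduces to the uncensored value $p/(1-p)$ when θ = 0. The Python sketch below compares this closed form with a direct numerical evaluation (Simpson's rule) of the integral; the function names are hypothetical.

```python
import math


def sigma2_numeric(p, theta, m=2000):
    """sigma_p^2 of Theorem 2 for F0 = Exp(1) and H = Exp(rate theta),
    with the integral evaluated by Simpson's rule (m must be even)."""
    xi = -math.log(1.0 - p)                  # Q^0(p) for Exp(1)
    f0_xi = math.exp(-xi)                    # f0(Q^0(p)) = 1 - p

    def integrand(u):
        one_minus_F = math.exp(-(1.0 + theta) * u)        # [1 - F0(u)][1 - H(u)]
        dFstar = math.exp(-u) * math.exp(-theta * u)      # f0(u)[1 - H(u)]
        return dFstar / one_minus_F ** 2

    step = xi / m
    total = integrand(0.0) + integrand(xi)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * integrand(k * step)
    integral = total * step / 3.0
    return (1.0 - p) ** 2 * integral / f0_xi ** 2


def sigma2_closed(p, theta):
    """Closed form [(1 - p)^{-(1 + theta)} - 1] / (1 + theta)."""
    return ((1.0 - p) ** (-(1.0 + theta)) - 1.0) / (1.0 + theta)


for p in (0.25, 0.5, 0.75):
    print(p, round(sigma2_numeric(p, 3 / 7), 6), round(sigma2_closed(p, 3 / 7), 6))
```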

An example of a bandwidth sequence $\{h_n\}$ satisfying the conditions of Theorem 2 is $h_n = c_0\, n^{-d}$, $\tfrac{1}{2} < d < 1$, for some constant $c_0 > 0$.
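To see why a sequence $h_n = c_0 n^{-d}$ with $\tfrac12 < d < 1$ meets the three bandwidth conditions of Theorem 2 (here $c_0 > 0$ is an arbitrary constant), each condition reduces to a power of n:

```latex
\begin{aligned}
h_n        &= c_0\, n^{-d}      \to 0      && (d > 0),\\
n\, h_n    &= c_0\, n^{1-d}     \to \infty && (d < 1),\\
\sqrt{n}\, h_n &= c_0\, n^{1/2-d} \to 0    && (d > 1/2).
\end{aligned}
```

For instance, d = 3/4 gives $n h_n = c_0 n^{1/4}$ and $\sqrt{n}\, h_n = c_0 n^{-1/4}$.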

4.  SOME  SIMULATION  RESULTS 

A small Monte Carlo simulation was performed to obtain an indication of the performance of $x_n(p)$ compared with the kernel-type estimator $\hat Q_n(p)$ for small sample sizes. For these simulations the triangular density on [-1, 1] was used for K, $K(u) = 1 - |u|$, $|u| \le 1$. The censoring distribution H was taken to be exponential, and the lifetime distributions used were Weibull with shape parameter α and scale parameter equal to one, that is,

$$F_0(x) = 1 - e^{-x^{\alpha}}, \qquad \alpha = 0.5, 1, \text{ and } 2.$$

The bandwidth values h = 0.01(0.04)0.61 were used for quantiles at p = 0.10, 0.25, 0.50, 0.75. Sample sizes of n = 30, 60, 100 were studied.

In each case simulated (i.e., each distribution, bandwidth, p, and sample size combination), 300 censored samples were generated using the random number generators in the International Mathematical and Statistical Library (IMSL, 1985) on a VAX 11/8300 computer. From the 300 samples the estimated mean squared errors (average squared errors, ASE) of the estimators $x_n(p)$, $Q_n(p)$, and $\hat Q_n(p)$ were computed, and the ratios of these ASEs, $\mathrm{ASE}[Q_n(p)]/\mathrm{ASE}[x_n(p)]$ and $\mathrm{ASE}[\hat Q_n(p)]/\mathrm{ASE}[x_n(p)]$, were calculated.
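A scaled-down version of this Monte Carlo comparison can be sketched in Python as follows. It is not the authors' program: it uses Python's `random` module instead of the IMSL generators, covers only the exponential lifetime case with exponential censoring at rate 3/7 (about 30% censoring, as in Table 1), a single (n, p, h) combination, and only the ratio $\mathrm{ASE}[Q_n(p)]/\mathrm{ASE}[x_n(p)]$. The helper names are hypothetical, and the output is not expected to reproduce the tabulated ratios exactly.

```python
import math
import random


def K(u):
    return max(0.0, 1.0 - abs(u))                      # triangular kernel on [-1, 1]


def W(u):
    if u <= -1.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    return 0.5 * (1.0 + u) ** 2 if u <= 0.0 else 1.0 - 0.5 * (1.0 - u) ** 2


def pl_jumps(data):
    """Ordered times Z_j and PL jumps s_j (largest observation treated as a failure)."""
    data = sorted(data, key=lambda pair: (pair[0], -pair[1]))
    n, surv, z, s = len(data), 1.0, [], []
    for i, (x, lam) in enumerate(data, start=1):
        new = 0.0 if i == n else surv * ((n - i) / (n - i + 1)) ** lam
        z.append(x)
        s.append(surv - new)
        surv = new
    return z, s


def pl_quantile(z, s, p):
    total = 0.0
    for zj, sj in zip(z, s):
        total += sj
        if total >= p:
            return zj
    return z[-1]


def x_n(z, s, p, h, x0, tol=1e-10):
    """Solve sum_j s_j W((x - Z_j)/h) = p: Newton-Raphson with a bisection fallback."""
    lo, hi, x = z[0] - h, z[-1] + h, x0
    for _ in range(200):
        g = sum(sj * W((x - zj) / h) for zj, sj in zip(z, s)) - p
        if abs(g) < tol or hi - lo < tol:
            break
        lo, hi = (lo, x) if g > 0 else (x, hi)
        d = sum(sj * K((x - zj) / h) for zj, sj in zip(z, s)) / h
        x = x - g / d if d > 0 and lo < x - g / d < hi else 0.5 * (lo + hi)
    return x


def simulate_ase(n=30, p=0.5, h=0.21, theta=3 / 7, reps=300, seed=1):
    """Average squared errors of Q_n(p) and x_n(p) for Exp(1) lifetimes
    censored by Exp(rate theta) times."""
    rng = random.Random(seed)
    q0 = -math.log(1.0 - p)                               # true Q^0(p) for Exp(1)
    se_pl = se_x = 0.0
    for _ in range(reps):
        data = []
        for _ in range(n):
            life, cens = rng.expovariate(1.0), rng.expovariate(theta)
            data.append((min(life, cens), 1 if life <= cens else 0))
        z, s = pl_jumps(data)
        qn = pl_quantile(z, s, p)
        se_pl += (qn - q0) ** 2
        se_x += (x_n(z, s, p, h, qn) - q0) ** 2
    return se_pl / reps, se_x / reps


ase_pl, ase_x = simulate_ase()
print(f"ASE(Q_n) = {ase_pl:.4f}  ASE(x_n) = {ase_x:.4f}  ratio = {ase_pl / ase_x:.3f}")
```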

Some of the results of the simulations are given in Tables 1-3. In all cases for each p, except for small p for the Weibull lifetime distribution with α = 0.5, there is a range of bandwidth values for which $x_n(p)$ has smaller ASE than that of the PL quantile estimator, $Q_n(p)$. Also, in many of the simulated cases, there is a range of h values for which the ASE of $x_n(p)$ is less than that of the kernel estimator, $\hat Q_n(p)$. This is the case for the exponential lifetime distribution for all values of p simulated. However, neither $x_n(p)$ nor $\hat Q_n(p)$ is uniformly better than the other over all values of p and bandwidths used in the simulations.

In the next section the bootstrap will be used to determine, based on the given censored sample, the "best" value of h to use (in the sense of the smallest bootstrap MSE) in calculating $x_n(p)$ as p varies. Bootstrap confidence bounds for the true quantile $Q^0(p)$ will also be discussed, and an example using switch failure data adapted from Nair (1984) will be presented.

5.  BOOTSTRAP  METHODS:  BANDWIDTH  SELECTION  AND  CONFIDENCE  INTERVALS

Since the estimator $x_n(p)$ is implicitly defined as the solution to $\hat F_n(x_n(p)) = p$, where $\hat F_n(x) = \int_{-\infty}^{x} \hat f_n(t)\,dt$, it depends on the bandwidth value h used in the kernel-smoothed density estimate $\hat f_n(t)$. Thus, in practice h must be chosen before the estimator $x_n(p)$ can be computed. A natural question to ask is: "Which bandwidth value yields the 'best' estimate $x_n(p)$ of $Q^0(p)$, in the sense of minimum mean squared error (MSE)?" Due to the censoring, exact or even asymptotic expressions for $\mathrm{MSE}(x_n(p))$ are difficult to derive. In this section we propose a method of selecting bandwidths based on minimizing the bootstrap estimate of $\mathrm{MSE}(x_n(p))$.
TABLE 1.  Ratios of Average Squared Errors (300 samples)
          Exponential (λ=1) Life Distribution and
          Exponential (θ=3/7) Censoring Distribution (30% censoring)

                                                 h
  n    p         .01    .05    .09    .13    .17    .21    .25    .29    .33    .37    .45    .57
n=30   .10  a   .127  1.016  1.028   .965  1.035  1.044  1.024  1.206  1.126   .867   .794   .611
            b   .123  1.082  1.046   .940   .909   .880   .951  1.124  1.152  1.370  1.923  1.742
       .25  a   .226   .026  1.027  1.121  1.183  1.105  1.192  1.119  1.332  1.197  1.345  1.238
            b   .216   .235   .958   .938   .995   .988   .976  1.040  1.020  1.093  1.295  1.689
       .50  a  1.008  1.048  1.061  1.044  1.055  1.092  1.093  1.136  1.134  1.174  1.156  1.241
            b   .972   .957   .932   .930   .960   .976  1.032  1.088  1.016  1.052  1.538  1.645
       .75  a  1.005  1.021  1.022  1.033  1.068  1.086  1.112  1.074  1.191  1.149  1.177  1.205
            b   .865   .938  1.053   .941   .839   .964   .786   .864   .911   .595   .515   .536
n=50   .10  a  1.018  1.116  1.136  1.146  1.366  1.204  1.163  1.140  1.060  1.030   .745   .266
            b  1.018   .978   .958   .943   .938   .982  1.022  1.260  1.287  1.484  1.725  1.604
       .25  a  1.006  1.033  1.102  1.111  1.178  1.198  1.274  1.244  1.276  1.290  1.295  1.248
            b   .979  1.001   .977   .992   .999   .999   .969  1.030  1.082  1.102  1.325  2.650
       .50  a  1.007  1.033  1.040  1.093  1.134  1.119  1.155  1.147  1.129  1.234  1.205  1.285
            b   .974   .976   .956   .964   .945   .994   .991  1.046  1.040  1.205  1.385  2.659
       .75  a  1.003  1.027  1.029  1.063  1.034  1.069  1.100  1.133  1.174  1.165  1.193  1.203
            b   .959   .870   .962   .953  1.080  1.002   .962  1.256  1.076   .948   .817   .859
n=100  .10  a  1.024  1.125  1.215  1.291  1.265  1.267  1.275  1.326  1.177   .732   .559   .227
            b   .980   .933   .953   .951   .923   .993  1.084  1.666  1.875  1.521  1.853  1.682
       .25  a  1.017  1.033  1.041  1.228  1.184  1.205  1.220  1.212  1.317  1.294  1.253  1.189
            b   .938   .991   .983   .977   .983  1.013  1.022  1.059  1.121  1.264  1.524  2.897
       .50  a  1.009  1.014  1.035  1.059  1.107  1.173  1.132  1.064  1.218  1.168  1.238  1.243
            b   .935   .996   .938   .955   .971   .971  1.018  1.079  1.294  1.253  1.924  3.112
       .75  a  1.007  1.014  1.033  1.089  1.072  1.078  1.0S8  1.093  1.089  1.209  1.248  1.202
            b   .954   .943   .906   .879   .934  1.202  1.459  1.617  1.153   .925   .816  1.419

a. $\mathrm{ASE}(Q_n)/\mathrm{ASE}(x_n)$ (PL quantile estimator)
b. $\mathrm{ASE}(\hat Q_n)/\mathrm{ASE}(x_n)$ (kernel-type estimator)


       .10  a   .795   .636   .443   .443   .507   .486   .280   .231   .114   .074   .034   .028
            b  1.229  1.049   .946  1.006   .968   .975   .821  1.009   .721   .725   .925  1.087
       .25  a  1.021  1.120  1.122  1.127  1.196  1.129  1.024  1.091   .931   .787   .818   .674
            b  1.010   .963  1.025  1.099  1.163  1.220  1.241  1.523  1.586  2.081  3.183  6.655
       .50  a  1.006  1.019  1.037  1.050  1.064  1.076  1.128  1.121  1.152  1.168  1.143  1.174
            b   .966   .977  1.009   .993  1.123  1.226  1.436  1.443  1.748  2.131  3.449  4.194
       .75  a  1.002  1.010  1.018  1.022  1.031  1.036  1.052  1.051  1.071  1.075  1.086  1.133
            b   .971   .866   .898   .875   .778   .647   .608   .596   .579   .452   .347   .265
n=60   .10  a   .827   .650   .652   .552   .391   .317   .109   .073   .051   .045   .032   .060
            b  1.070   .895   .860   .889   .781   .913   .534   .558   .459   .552   .624   .623
       .25  a  1.007  1.029  1.125  1.047   .959   .831   .851   .687   .614   .588   .509   .401
            b  1.020  1.070  1.046  1.084  1.174  1.116  1.234  1.360  1.741  1.947  3.126  9.496
       .50  a  1.003  1.010  1.024  1.063  1.052  1.135  1.065  1.175  1.086  1.167  1.137  1.119
            b   .991   .988   .925  1.033  1.127  1.341  1.762  1.902  2.023  3.505  4.118 11.417
       .75  a  1.003  1.010  1.021  1.030  1.032  1.054  1.040  1.065  1.081  1.073  1.117  1.141
            b   .865   .961   .782   .759   .819   .794   .833   .756   .779   .436   .396   .234

a. $\mathrm{ASE}(Q_n)/\mathrm{ASE}(x_n)$ (PL quantile estimator)
b. $\mathrm{ASE}(\hat Q_n)/\mathrm{ASE}(x_n)$ (kernel-type estimator)


.939  1.081 
.872  .904 


1.220 

.801 


1.226  1. 
.745  . 


318  1. 
664  . 


386  1.565  1.531  1.348  1.417  1.052 
685  .593  .564  .544  .451  .306 


.708  .429 

.678  .404 


1.083 

.956 


1.125 

.966 


1.180  1 
.950 


.156  1, 
.946  . 


312  1.287  1.393  1.290  1.493  1.464 
899  .883  .866  .855  .780  .702 


1.012  1.015 
.991  .990 


1.114 

.978 


1.118 

.947 


1.104  1, 
.957  , 


161  1, 
976  , 


161  1.276  1.316  1.326  1.243  1.302 
958  .963  .940  .928  .987  .906 


1.012  1.023 
.967  1.001 


1.065 

.958 


1.091 

.937 


1.109  1. 
.872  . 


165  1, 
978  . 


168  1.220  1.292  1.415  1.532  1.345 
924  .863  .913  .820  1.276  2.537 


1.030  1.125 
.981  .930 


1.210 

.876 


1.267 

.821 


1.454  1 
.791 


.373  1 
.716 


.400  1.394  1.394  1.374  1.123  .536 
.776  .738  .650  .542  .371  .163 


a  1.009  1.043 


.997  .995 


1.104 

.965 


1.097 

.984 


1.165  1, 
.939  , 


257  1 
927 


.289  1.294  1.209  1.365  1.340  1.220 
.905  .887  .902  .890  .841  .749 


1.000  1.002 

.993  .984 


1.076 

.961 


1.088 

.972 


1.149  1. 
.953  . 


207  1 
979 


.189  1.203  1.220  1.293  1.303  1.439 
.977  .976  .962  .963  .948  .927 


1.000  1.027 
1.005  .965 


1.095 

.959 


1.128 

.957 


1.111  1 

.954 


.151  1 
.984  1 


.144  1.227  1.217  1.247  1.350  1.165 
.025  .951  .914  .929  2.011  4.094 


a. $\mathrm{ASE}(Q_n)/\mathrm{ASE}(x_n)$ (PL quantile estimator)
b. $\mathrm{ASE}(\hat Q_n)/\mathrm{ASE}(x_n)$ (kernel-type estimator)


Marron and Padgett (1987) have determined an asymptotically optimal bandwidth for the density estimate $\hat f_n(t)$ which minimizes the integrated squared error (ISE) of $\hat f_n$. Since their bandwidth value is based on minimizing the (asymptotic) global ISE of $\hat f_n$, it does not work well in the setting of quantile estimation for small samples. Bandwidth values for the quantile estimates considered here depend on p, and results regarding bandwidth selection for $\hat f_n(t)$ with respect to the asymptotic minimum ISE criterion do not carry over to the estimator $x_n(p)$.

Recently, the scope of the bootstrap has been extended from the iid case to include more complex data structures such as censored data and correlated data (see Efron and Tibshirani, 1986). By creating bootstrap replicates which are intended to "mimic" the statistical properties of the sample (and thus the population), one can learn about the sampling distribution of a statistic, regardless of its complicated form. In this paper the nonparametric bootstrap for censored data is used to investigate $\mathrm{MSE}(x_n(p))$ as a function of bandwidth.

Recall that $(X_i, \Delta_i)$, $i = 1, \ldots, n$, denotes the observed censored sample. Unlike the iid case, there is not a well-defined method for obtaining a bootstrap replicate $(X_i^*, \Delta_i^*)$, $i = 1, \ldots, n$. Several methods have been proposed for resampling censored data. Reid (1981) proposed resampling from the Kaplan-Meier estimator $F_n$ of $F_0$, which results in bootstrap samples that contain only uncensored observations. In Efron's (1981) plan, one simply takes a random sample with replacement from $(X_1, \Delta_1), \ldots, (X_n, \Delta_n)$. While Reid's approach is analogous to resampling in the uncensored case, it is not clear what is being estimated by the bootstrap since no censored observations are present in any of the bootstrap replicates. Akritas (1986) studied the asymptotic properties of Efron's and Reid's procedures for bootstrapping censored survival data. He showed that Efron's approach yields asymptotically correct confidence bands for $F_0$ based on the Kaplan-Meier estimate $F_n$, while Reid's does not. Since $x_n(p)$ involves $F_n$, we have adopted Efron's approach of drawing at random, with replacement, from the n data values to get $(X_i^*, \Delta_i^*)$, $i = 1, \ldots, n$.
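In code, Efron's plan amounts to resampling whole pairs. A minimal Python sketch (the helper name `efron_resample` is hypothetical): each drawn time keeps its own censoring indicator, so censored observations also appear in the bootstrap replicates, unlike in Reid's scheme.

```python
import random


def efron_resample(data, rng):
    """Efron's (1981) plan: draw n pairs (X_i, Delta_i) with replacement,
    each observed time staying attached to its own censoring indicator."""
    return rng.choices(data, k=len(data))


sample = [(0.8, 1), (1.3, 0), (2.1, 1), (2.9, 0), (3.4, 1)]
boot = efron_resample(sample, random.Random(42))
print(boot)
```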

For fixed p and h, we define the bootstrap estimate of $\mathrm{MSE}(x_n(p))$ as follows. For each bootstrap replicate, the quantile estimate $x_n^*(p)$ is calculated. As is usually the case, a large number, B, of bootstrap samples and the corresponding estimates $x_n^*(p)$ are obtained. The bootstrap estimate of variance is given by

$$\mathrm{Var}^*(x_n(p)) = \frac{1}{B-1} \sum_{i=1}^{B} \left[ x_n^{*i}(p) - \bar x_n^*(p) \right]^2, \qquad (5.1)$$

where $x_n^{*i}(p)$ denotes the estimate $x_n(p)$ calculated from the ith bootstrap replicate and $\bar x_n^*(p) = \sum_{i=1}^{B} x_n^{*i}(p)/B$. The bias estimate is

$$\mathrm{Bias}^*(x_n(p)) = \bar x_n^*(p) - x_n(p), \qquad (5.2)$$

where $x_n(p)$ is the estimate calculated from the original data. Then for a bandwidth value h and fixed p, the bootstrap estimate of $\mathrm{MSE}(x_n(p))$ is

$$\mathrm{MSE}^*(h) = \mathrm{Var}^*(x_n(p)) + [\mathrm{Bias}^*(x_n(p))]^2.$$

The value $h^*$ which yields the minimum $\mathrm{MSE}^*(h)$ is the bandwidth selected to calculate $x_n(p)$.
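The selection rule can be sketched in Python as below, assuming the triangular kernel, Efron's pair resampling, and a finite grid of candidate bandwidths. Reusing the same random seed for every h (so that all bandwidths are compared on the same bootstrap samples) is a design choice made here, not something stated above, and all helper names are hypothetical.

```python
import random


def K(u):
    return max(0.0, 1.0 - abs(u))                      # triangular kernel on [-1, 1]


def W(u):
    if u <= -1.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    return 0.5 * (1.0 + u) ** 2 if u <= 0.0 else 1.0 - 0.5 * (1.0 - u) ** 2


def pl_jumps(data):
    """Ordered times Z_j and PL jumps s_j; the largest observation is treated
    as a failure, and failures precede censorings at tied times."""
    data = sorted(data, key=lambda pair: (pair[0], -pair[1]))
    n, surv, z, s = len(data), 1.0, [], []
    for i, (x, lam) in enumerate(data, start=1):
        new = 0.0 if i == n else surv * ((n - i) / (n - i + 1)) ** lam
        z.append(x)
        s.append(surv - new)
        surv = new
    return z, s


def pl_quantile(z, s, p):
    total = 0.0
    for zj, sj in zip(z, s):
        total += sj
        if total >= p:
            return zj
    return z[-1]


def x_n(z, s, p, h, x0, tol=1e-10):
    """Solve sum_j s_j W((x - Z_j)/h) = p: Newton-Raphson with a bisection fallback."""
    lo, hi, x = z[0] - h, z[-1] + h, x0
    for _ in range(200):
        g = sum(sj * W((x - zj) / h) for zj, sj in zip(z, s)) - p
        if abs(g) < tol or hi - lo < tol:
            break
        lo, hi = (lo, x) if g > 0 else (x, hi)
        d = sum(sj * K((x - zj) / h) for zj, sj in zip(z, s)) / h
        x = x - g / d if d > 0 and lo < x - g / d < hi else 0.5 * (lo + hi)
    return x


def estimate(data, p, h):
    """x_n(p) for one censored sample, started at the PL quantile Q_n(p)."""
    z, s = pl_jumps(data)
    return x_n(z, s, p, h, pl_quantile(z, s, p))


def bootstrap_mse(data, p, h, B, rng):
    """MSE*(h) = Var* + Bias*^2 from B resamples of the pairs (Efron's plan)."""
    x_orig = estimate(data, p, h)
    reps = [estimate(rng.choices(data, k=len(data)), p, h) for _ in range(B)]
    mean = sum(reps) / B
    var = sum((r - mean) ** 2 for r in reps) / (B - 1)        # (5.1)
    bias = mean - x_orig                                       # (5.2)
    return var + bias ** 2, reps


def select_bandwidth(data, p, grid, B=300, seed=0):
    """Return (h*, MSE*(h*), replicates at h*) minimising MSE*(h) over the grid."""
    best = None
    for h in grid:
        # Same seed for every h: all bandwidths are compared on the same resamples.
        mse, reps = bootstrap_mse(data, p, h, B, random.Random(seed))
        if best is None or mse < best[1]:
            best = (h, mse, reps)
    return best


# Usage: n = 60, exponential (mean 1) lifetimes and censoring times.
rng = random.Random(7)
sample = []
for _ in range(60):
    life, cens = rng.expovariate(1.0), rng.expovariate(1.0)
    sample.append((min(life, cens), 1 if life <= cens else 0))
h_star, mse_star, reps_star = select_bandwidth(sample, 0.5, [0.1, 0.2, 0.3, 0.4], B=100)
print(f"h* = {h_star}, bootstrap MSE = {mse_star:.4f}")
```

The replicates returned for $h^*$ are exactly what the percentile interval described next needs.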

Once the bandwidth $h^*$ has been selected, the set of B values $x_n^{*1}(p), \ldots, x_n^{*B}(p)$ corresponding to $h^*$ can be used to construct a confidence interval for $Q^0(p)$. Define the bootstrap cumulative distribution function of $x_n^*(p)$ by

$$G^*(y) = \#\{x_n^{*i}(p) \le y\}/B. \qquad (5.3)$$

Then the endpoints of the interval are quantiles of $G^*$; that is, a $100(1-\alpha)\%$ confidence interval for $Q^0(p)$ is given by

$$\left[ G^{*-1}(\alpha/2),\ G^{*-1}(1 - \alpha/2) \right]. \qquad (5.4)$$

Note that this is an application of Efron's (1980) percentile interval method. A refinement of this method, called the "bias-corrected acceleration constant percentile interval," has recently been proposed by Efron (1987) but was not considered here.
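Given the B replicates at $h^*$, the interval (5.4) needs only order statistics. A minimal Python sketch, taking $G^{*-1}(q) = \inf\{y: G^*(y) \ge q\}$ as the inverse of the step function (5.3), which is one common convention and an assumption here; the function name is hypothetical.

```python
import math


def percentile_interval(reps, alpha=0.05):
    """Percentile interval (5.4): [G*^{-1}(alpha/2), G*^{-1}(1 - alpha/2)], where
    G* is the empirical distribution function (5.3) of the bootstrap replicates."""
    xs = sorted(reps)
    B = len(xs)

    def g_inv(q):
        # Smallest replicate y with G*(y) = #{x*_i <= y} / B >= q.
        # The 1e-9 guards against rounding in q * B (e.g. 0.975 * 1000).
        k = max(1, math.ceil(q * B - 1e-9))
        return xs[k - 1]

    return g_inv(alpha / 2), g_inv(1 - alpha / 2)


print(percentile_interval([float(k) for k in range(1, 1001)]))   # -> (25.0, 975.0)
```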


To illustrate the performance of bootstrap bandwidth selection and confidence intervals, a sample of size n = 60 was generated using an exponential (mean = 1) lifetime distribution and an exponential (mean = 1) censoring distribution. The triangular density on [-1, 1] was used as the kernel function K(u) for $\hat f_n$. Bandwidth values h = .02(.02).60 were considered, and quantile estimates for p = .025(.025).975, as well as p = .01 and .99, were studied. For the calculation of bootstrap MSE, B = 300 was used. Some results are given in Table 4 and Figure 1. Note that the values of $h^*$ are small for small p, increase for p up to about 0.75, and then tend to decrease for larger p. For p = .10, Figure 2 shows a graph of the bootstrap $\mathrm{MSE}^*(h)$ vs. h. The general quadratic shape of the MSE curve is obvious. Table 4 indicates that the estimator $x_n(p)$ is often very close to the product-limit estimator, but when the two differ, $x_n(p)$ is closer to the true quantile. Figure 1 illustrates the close agreement of $x_n(p)$ with $Q^0(p)$ for all p. Note that the confidence bands become wider as p increases. This was the case in all the simulations and is probably a result of the random right censoring present in the data. In order to better estimate the sampling distribution of $x_n(p)$, B = 1000 was used in obtaining confidence bounds.

Next, as an application of the bootstrap bandwidth selection procedure, we consider mechanical switch failure data adapted from Nair (1984) (see Table 5). The triangular kernel was used and estimates were obtained for p = .05(.05).95. Table 6 gives the estimates, bootstrap bandwidths, and MSE values for some values of p. The quantile estimate and confidence bands are plotted in Figure 3.


TABLE 4.  Bootstrap Bandwidth Selection for $x_n(p)$
          Exponential Lifetime Distribution (mean = 1) and
          Exponential Censoring Distribution (mean = 1)

     p    h*   x_n(p)   Q_n(p)   Q^0(p)
   .01   .02    -.001     .001     .010
   .10   .26     .101     .053     .105
   .25   .06     .310     .311     .288
   .50   .53     .757     .755     .693
   .75   .54    1.523    1.521    1.386
   .90   .58    2.344    2.106    2.303
   .99   .40    4.627    4.491    4.605

TABLE 5.  Failure Times (in Millions of Operations) of Switches

   2.119   2.135   2.197   2.199   2.227   2.250
   1.667   1.695   1.710   1.955   1.965   2.012   2.051   2.076   2.109   2.116


TABLE  7.  Quantile  Estimates  for  Switch  Data 


5 

.10 

1.65 

5 

.24 

2.16 

0 

.60 

2.58 

5 

.44 

3.01 

APPENDIX:  PROOFS  OF  THEOREMS

Denote $Q^0(p) = \xi_p$.

Proof of Theorem 1(i). For fixed $0 < p < 1$, write

$$|F_0(x_n(p)) - F_0(\xi_p)| = |F_0(x_n(p)) - \hat F_n(x_n(p))| \le \sup_x |\hat F_n(x) - F_0(x)|. \qquad (A.1)$$

The right-hand side of (A.1) converges to zero almost surely by Theorem 5.3 of Foldes, Rejto and Winter (1980). Since $F_0^{-1}$ is continuous, $x_n(p) \to \xi_p$ almost surely. ///

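To prove part (ii) of Theorem 1, the following lemma is needed. Let $K_n(\cdot) = W(\cdot/h_n)$ and $K_n(y; x) = 1 - K_n(x - y)$.

Lemma. Let $F_0$ be continuous and $T_{F_0} \le T_H \le \infty$. Then $\sup_{-\infty < x < T_{F_0}} |\hat F_n(x) - F_0(x)| \to 0$ almost surely.

Proof. By Corollary 2(ii) of Csörgő and Horváth (1983), $\sup_{-\infty < x < T_{F_0}} |F_n(x) - F_0(x)| \to 0$ almost surely. From Lemma 5.2 of Foldes, Rejto and Winter (1980) and by definition of $T_{F_0}$, letting $\bar F_n(x) = \int F_0(y)\, dK_n(y; x)$,

$$\begin{aligned} |\hat F_n(x) - \bar F_n(x)| &= \left| \int_{-\infty}^{T_{F_0}} F_n(y)\, dK_n(y; x) - \bar F_n(x) \right| = \left| \int_{-\infty}^{T_{F_0}} [F_n(y) - F_0(y)]\, dK_n(y; x) \right| \\ &\le \sup_{-\infty < x < T_{F_0}} |F_n(x) - F_0(x)| \int_{-\infty}^{T_{F_0}} dK_n(y; x) \le \sup_{-\infty < x < T_{F_0}} |F_n(x) - F_0(x)|. \end{aligned}$$

Thus, $\sup_{-\infty < x < T_{F_0}} |\hat F_n(x) - \bar F_n(x)| \to 0$ almost surely.

Now, given $\varepsilon > 0$, let $\delta > 0$ be such that $|y| < \delta$ implies $\sup_x |F_0(x - y) - F_0(x)| < \varepsilon$. Then

$$|\bar F_n(x) - F_0(x)| = \left| \int F_0(x - y)\, dK_n(y) - \int F_0(x)\, dK_n(y) \right| \le \varepsilon + \int_{|y| \ge \delta} dK_n(y). \qquad (A.2)$$

But as $n \to \infty$ the last integral in (A.2) approaches zero, so $\sup_x |\bar F_n(x) - F_0(x)| \to 0$. The result of the lemma now follows by the triangle inequality. ///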


Proof of Theorem 1(ii). The result follows from the Lemma, the continuity of $F_0^{-1}$, and writing

$$|F_0(x_n(p)) - F_0(\xi_p)| = |F_0(x_n(p)) - \hat F_n(x_n(p))| \le \sup_{-\infty < x < T_{F_0}} |\hat F_n(x) - F_0(x)|. \qquad ///$$

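Proof of Theorem 2. Approximating $\hat F_n(x_n(p)) = p$ by its two-term Taylor expansion about $\xi_p$, write

$$\sqrt{n}\,[x_n(p) - \xi_p] = \frac{\sqrt{n}\,[F_0(\xi_p) - \hat F_n(\xi_p)]}{\hat f_n(\eta)}, \qquad (A.3)$$

where η is some (random) point between $x_n(p)$ and $\xi_p$.

Now, $|\hat f_n(\eta) - f_0(\xi_p)| \le |\hat f_n(\eta) - f_0(\eta)| + |f_0(\eta) - f_0(\xi_p)|$. Under the assumptions of the theorem, by Corollary 2 of Mielniczuk (1986),

$$\sup_{0 \le x \le T} |\hat f_n(x) - f_0(x)| \to 0 \quad \text{almost surely}.$$

From Theorem 1, $x_n(p) \to \xi_p$ almost surely, so $\eta \to \xi_p$ almost surely. Hence, by the continuity of $f_0$, $|\hat f_n(\eta) - f_0(\xi_p)| \to 0$ almost surely. Therefore, (A.3) has the same limiting distribution as $\sqrt{n}\,[F_0(\xi_p) - \hat F_n(\xi_p)]/f_0(\xi_p)$.

Now, consider $\sqrt{n}\,[\hat F_n(\xi_p) - F_0(\xi_p)] = I + II$, where $I = \sqrt{n}\,[\hat F_n(\xi_p) - F_n(\xi_p)]$ and $II = \sqrt{n}\,[F_n(\xi_p) - F_0(\xi_p)]$. Next, write

$$I = \sqrt{n} \left[ \int_{-\infty}^{\xi_p} \int h^{-1} K\!\left(\frac{u - x}{h}\right) dF_n(x)\, du - F_n(\xi_p) \right] = \sqrt{n} \left[ \int \int_{-\infty}^{\xi_p} h^{-1} K\!\left(\frac{u - x}{h}\right) du\, dF_n(x) - \int_{-\infty}^{\xi_p} dF_n(x) \right].$$

Since $\int_{-\infty}^{\xi_p} h^{-1} K((u - x)/h)\, du = W((\xi_p - x)/h) \le 1$, and this integral vanishes for $x \ge \xi_p + ch$ (K being zero outside $[-c, c]$),

$$I \le \sqrt{n}\,[F_n(\xi_p + ch) - F_n(\xi_p)]. \qquad (A.4)$$

Symmetrically, since $W((\xi_p - x)/h) = 1$ for $x \le \xi_p - ch$, $I \ge -\sqrt{n}\,[F_n(\xi_p) - F_n(\xi_p - ch)]$, and this lower bound is handled in the same way as the right side of (A.4).

Next, letting $\mathcal{K}(t, s)$ denote a Kiefer process and $\alpha_n(t) = \sqrt{n}\,[F_n(t) - F_0(t)]$ be the PL process (see Csörgő, 1983, for these definitions), the right side of (A.4) can be written as

$$\begin{aligned} &[\alpha_n(\xi_p + ch) - n^{-1/2}\mathcal{K}(\xi_p + ch, n)] - [\alpha_n(\xi_p) - n^{-1/2}\mathcal{K}(\xi_p, n)] \\ &\quad + n^{-1/2}[\mathcal{K}(\xi_p + ch, n) - \mathcal{K}(\xi_p, n)] + n^{1/2}[F_0(\xi_p + ch) - F_0(\xi_p)]. \qquad (A.5) \end{aligned}$$

From Theorem A of Aly, Csörgő, and Horváth (1985), under the assumptions of the theorem, the first two terms of (A.5) converge to zero as $n \to \infty$. Also, by a proof similar to that of Lemma 1 of Lio, Padgett, and Yu (1986), the third term of (A.5) converges to zero in probability as $n \to \infty$. For the last term of (A.5), since $f_0$ is continuous and positive at $\xi_p$ by assumption,

$$ch\sqrt{n} \left[ \frac{F_0(\xi_p + ch) - F_0(\xi_p)}{ch} \right] \to 0$$

as $n \to \infty$, because the bracketed difference quotient converges to $f_0(\xi_p)$ and $\sqrt{n}\,h \to 0$ by the conditions of the theorem. Therefore, $I \to 0$ in probability as $n \to \infty$. Hence, $\sqrt{n}\,[\hat F_n(\xi_p) - F_0(\xi_p)]$ has the same well-known limiting distribution as II, which is the normal distribution with mean zero and variance

$$(1 - p)^2 \int_0^{\xi_p} [1 - F(u)]^{-2}\, dF^*(u),$$

completing the proof. ///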